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Abstract 



A dynamical picture of phylogenetic evolution is given in terms of Markov models on a state space, 
comprising joint probability distributions for character types of taxonomic classes. Phylogenetic branch- 
ing is a process which augments the number of taxa under consideration, and hence the rank of the un- 
derlying joint probability state tensor. We point out the combinatorial necessity for a second-quantised, 
or Fock space setting, incorporating discrete counting labels for taxa and character types, to allow for 
a description in the number basis. Rate operators describing both time evolution without branching, 
and also phylogenetic branching events, are identified. A detailed development of these ideas is given, 
using standard transcriptions from the microscopic formulation of nonequilibrium reaction-diffusion or 
birth-death processes. These give the relations between stochastic rate matrices, the matrix elements of 
the corresponding evolution operators representing them, and the integral kernels needed to implement 
these as path integrals. The 'free' theory (without branching) is solved, and the correct trilinear 'inter- 
action' terms (representing branching events) are presented. The full model is developed in perturbation 
theory via the derivation of explicit Feynman rules which establish that the probabilities (pattern fre- 
quencies of leaf colourations) arising as matrix elements of the time evolution operator are identical with 
those computed via the standard analysis. Simple examples (phylogenetic trees with 2 or 3 leaves), are 
discussed in detail. Further implications for the work are briefly considered including the role of time 
reparametrisation covariance. 
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1 Introduction and background 



The use of Markov models of change to taxonomic character probability distributions is a standard tech- 
nique for describing mutations, and for inferring ancestral relationships between taxa. A general stochastic 
framework for phylogenetic branching models is as follows. By assumption, different 'taxonomic units' are 
identified, and classified by a set of defining characteristics: based on morphological features for example, 
or on sequence data for a particular gene or protein say. To each taxon is ascribed a probability density on 
the set of characters, and it is the task of phylogenetic reconstruction to infer ancestral relationships within 
a group of taxa, given observed pattern frequencies for characters amongst the taxa (definitions are given 
later in the text) . In such phylogenetic reconstruction, the Markov chain model describing the stochastic 
evolution of characters is extended appropriately to encompass 'branching' where the number of taxa is 
augmented as new taxonomic types evolve (for example by speciation or gene duplication), from an initial 
single progenitor, through to the final number of types under study. For details of the subject, incuding 
overviews of applications, current problems and new directions, we refer to recent textbooks, for example 

ME 

In recent work 1 2j|23l|24l[]l0 1 it has been pointed out that a fruitful approach to phylogenetic analysis is 
afforded by taking the formal perspective of multilinear tensor algebra familiar from physical systems. For 
example, in the analysis of symmetry properties (of the Markov rate matrix, and of the branching process) it 
is natural to consider continuous Lie transformation groups acting on the tensor spaces, and the associated 
representation theory |23j. Furthermore, a remarkable analogy between branching processes (where the 
technical constraint of local conditional independence |21 1 is imposed) and state entanglement in quantum 
physics has been noted |24|. In particular, for 2 characters (equivalent to single qubit (2 state) systems 
in quantum physics) the well-known log dot distance measure for 2 taxa is essentially the concurrence 
(for 2 qubits, related to the von Neumann entropy of a partial density operator); equally the tangle (an 
entanglement measure for 3 qubits) has been proposed as a useful measure of distance for 3 taxa in the 
two character case, and the analysis of its properties in the phylogenetic context has been initiated 1241 1251 . 
Further applications of classical invariant theory for phylogenetic analysis are developed in IIIOI . 

In the letter [2 1 it was argued that a further useful perspective on phylogenetics, again inspired by 
physics, can be gained by interpreting 'branching' in the model as a linear operator which augments the 
rank of the tensor corresponding to the joint probability distribution of character types (see also 1123 1 1241 1. 
In order to regard the entire model, including especially the time development represented by the branch- 
ing dynamics, in a uniform way, it is natural to seek a setting in multilinear algebra where the linear space 
describing state probabilities for taxa can be lifted to an appropriate free algebra in the sense of tensor prod- 
ucts, or 'Fock space' in physical language, so that the linear 'branching operator' has a uniform (extended) 
domain of definition. Possible interaction terms representing this operator, corresponding to phylogenetic 
branching events, can then readily be implemented in the language of second quantization as shown in 
|2 1. Although formal, the transcription to physical language provided does indeed establish that the entire 
Markov branching model can be regarded as a standard Markov chain, but with dynamics on a suitably 
extended state space - a fact not noted explicitly before. With closed form expressions for the probabilities 
in hand, it may also be possible to investigate these from various analytical viewpoints not accessible hith- 
erto. Moreover, the physical language is quite flexible, and may suggest useful insights into the models as 
well as generalisations. 

In the present paper a further step towards such analytical investigations and generalisations is taken, 
in that the second-quantised framework is transcribed into the language of path integrals. The dynamical 
quantities of interest become phylogenetic 'path' variables (or 'classical fields'), defined over a discrete 
spatial lattice. Time evolution of the system is developed in perturbation theory, yielding standard proba- 
bilities as convolutions of the appropriate kernel with the initial probability distribution, that is, as matrix 
elements of the evolution operator Similar models of reaction-diffusion or birth-death processes have been 
extensively investigated ||3|6j|18 13 1 so that there is a wealth of technical experience within this approach, 
and possibilities for generalisation. These introductory comments are supplemented in the conclusions by 
further discussion of possible applications (see summary below). 

The outline of the paper is as follows'. In Sj2]below, we give an analysis of standard accounts of phy- 
logenetic processes (as used for example in analyses for inferring ancestral trees) to justify our claim that 
a multilinear tensor description is appropriate, and equivalent to the usual approach. A standard notation 
is introduced including the branching or 'splitting' operator whose properties are discussed. In S|3]the rate 

'For the benefit of readers unfamiliar with the subject-matter, technicalities in various sections below are treated as fully as 
possible. 



operator and the branching operator are re-formulated as interaction terms in an extended time evolution 
over Fock space. Attention is given to the 'copy space' needed to identify taxa - both for the observed taxa 
(leaves) and ancestral stages ('internal' edges of the phylogenetic tree) - and it is argued that for models 
with L distinguished leaves, a 2^ -dimensional 'label' space is needed. Label summations suggest a nat- 
ural identification with the 'momentum' space for periodic functions over a hypercubic 'spatial' lattice in 
L dimensions (with 2^ nodes in the unit cell), leading to the possibility of viewing the system dually in 
'position' space. ijUgives a brief pedagogical review of standard path integral techniques as applied for the 
analysis of nonequilibrium reaction-diffusion systems in a microscopic approach. In ijslthese ingredients 
are synthesised in a path integral formulation for a 'free' phylogenetic system, that is a collection of up to L 
taxa with no phylogenetic association (not necessarily in a stationary state). It is shown that the abstract dy- 
namics, represented by the evolution kernel of the system in the path integral approach which is formulated 
and derived explicitly, does indeed make the system evolve in a standard way according to a continuous 
Markov branching process. In i|6]the question of the branching operator is resumed, and plausible interac- 
tion terms (and corresponding normal kernels) are introduced in the path integral language. It is shown in 
simple examples (trees with 2 or 3 leaves) that, in both the operator and path integral language picture, the 
probabilities arising as matrix elements from the dynamics of the model are identical to those computed in 
standard likelihood analyses for inferring phylogenetic trees. This is borne out in the appendix, §A, where 
formal Feynman rules are derived directly from perturbation theory, and which can immediately be seen 
to encode the usual sum over extended leaf colourations presentations. The conclusions, reiterate the 
main points of the paper and further implications and applications of our work are briefly discussed. In 
particular, we comment on the role of the group of time reparametrisations (diffeomorphisms), in the issue 
of assigning 'true' historical time to phylogenetic events. 



2 Tensor methods and stochastic models of phylogenetic branching 

It is usual to pose the standard stochastic model of phylogenetics by stating transition probabilities 01 
Ell^lD- It is, however, possible to present the same system in an abstract multilinear tensor setting. 
Our philosophy here is similar to that of |17| (see also |(4J). In such a formulation, the evolution of the 
phylogenetic system is represented by a group action on a tensor product space, with the branching structure 
formalized by the introduction of linear 'splitting' operators which increment the rank of the tensor space. 
As pointed out in the introduction, this basis-independent description has many advantages, prompting 
the investigation of the rich algebraic structure of the system. The door is opened to the discussion of 
symmetry groups and subgroups, representation theory and diagonalization, the differential structure of the 
rate operators and orbit classes of their action, and the ring structure of invariant functions (see li23,,24„.10J '). 

Introduce a set, K, , which consists of K discrete elements, conventionally labelled by the integers 
{0, 1, 2, K ~ 1} . Consider a system consisting of N 'samples' to each of which can be attributed one 
of K distinct characters. Associated with such a system we have the set of frequencies 

^ total number of occurrences of character a 

: = — , a = 0, 1,...,/^-!. 

In particular we are interested in the character frequencies occurring in the genome of a given taxon. The 
archetypical example is that of the DNA sequence, where the 'samples' are sites, and with four characters 
{A, G, C, T} , but it is, of course, possible to envisage the use of other character sets derived from the 
molecular data, so K is left general in this discussion. Examples include the amino acids (K = 20 ), 
codons (K = 64) or instead of nucleic acid bases themselves, a binary pyrimidine/purine Y/R classi- 
fication of them (K = 2). For practical purposes, the usual practice is to take one particular gene of an 
organism as being the representative for the taxon class, although it would be possible to sample a whole 
genome or set of genomes and calculate the character frequencies across that set, and take those frequencies 
as the representative for that taxon. Practical considerations aside, we proceed to model the time evolution 
of these frequencies stochastically. 

Introduce a random variable X which takes on values in . It is necessary to define a set of time- 
dependent probabilities which are the theoretical limit 

p"{t) ■.= P{X = a,t) = lim (2-1) 
The stochastic evolution of the probablities is assumed to satisfy the continuous time Markov property, that 



is that the state at time t depends only on the immediately preceding state at time t — 5t say, and hence 



= P{X = a, t\X = (3,t- 5t)p'^{t - 6t), (2-2) 



which in turn implies, assuming linearity and differentiability. 



In'^(t) = V lim nX^a..t + 6t\X^P,t)-S-f, ^, 
dt ^ 5t^o 

We define the (time dependent) rate matrix 



, = y lim ^" ' /ft)- (2-3) 

dV ^ ' ^ st^o St ^ ' 



R^it) . lim nX = a,t + St\X = p,t)-S'^,^ 
st^o St 

To preserve reality of the probabilities and the property J^aP^i^) ~ ^ ^'^^ ^ follows that i? is a 
real-valued zero column sum matrix. In order to preserve positivity of the probabilities it must also be the 
case that for all t 

R"p{t)>0, Vay^/?; R°'a{t)<0 (nosum). (2-5) 
For a homogeneous model the rate matrix is assumed to be time independent, with solution 

p"(t) = y M°^(i)/(0), M"^(i) = [e^*]";,, (2-6) 

P 

where exp(i?t) is calculated using matrix multiplication. 

Phylogenetics is concerned with deriving the past evolutionary relationships of multiple taxa. As al- 
ready mentioned, the modern approach is to compare the genomes of the taxa. An essential part of the 
analysis is the ability to align the genomes of distinct taxa successfully. (The possibility or otherwise 
of such alignment is not discussed here). Having ahgned the genomes it is possible to calculate pattern 
frequences. These patterns are read off 'vertically' across the aligned sequences. The data is then 

uaia2 OIL total number of occurrences of pattern Q;iQ;2...aL 

p 1 2... L _ ^ 

ai,a2, aL =0, K - 1. 

Introduce random variables Xi,X2, ■■■,X]^ each of which takes on values in the individual character 
spaces JC , and X = {X1X2 ■ ■ ■ X^) which takes on values in the L -component character space ICx ICx 
■ ■ ■ xJC. We model these pattern frequencies by again defining a set of time-dependent probabilities which 
are the theoretical limit 

P"i"^-"^(t) :=P(X = aia2...aL,t) = lim p"i"2...«i, (^)_ 

The Markov property for this system is expressed as the dependence of P(X = ai a2 . . . , t) only on its 
values at the immediately preceding time, t — St say. It is also assumed that the transition probabilities are 
conditionally independent across different taxa. This is a standard assumption l:21. „9,. ,12J and is quite well 
founded from a biological perspective. Again assuming differentiability and linearity, a solution is found 
to be 

paia....a^(^) ^ ^ Mi''^p^{t)M2''^p,{t)...ML''^p^{t)P^^^^-P^{Q). (2-7) 
/3i,/32,...,/3i, 

The final part of the model is to introduce the branching. In the case of two taxa diverging from a 
common ancestor, considering that at the time of branching the character frequencies are identical, the 
correct formula for the pattern frequencies is given by (see for example 1161 ) 

= y A.h"^p{t)M2^-p{t)pP{Q), (2-8) 
/3e/c 



as will be derived in detail below. This situation can then be iterated for the case of arbitrary trees (see 
for example f7"8l as well as the standard texts already cited). Having given the standard stochastic model 
of phylogenetics we proceed to abstract the presentation. Introduce the vector space^ V = , with 
preferred basis {eo, ei, ...,eK-i} ■ We associate the set of probabilities i2-l\ with the unique vector 

p"(i)^p(i) = ^p"(i)e„. (2-9) 

Having made this abstraction it is possible to view the stochastic evolution given by i2-6i as linear group 
action on V , clearly an appropriate one parameter subgroup of GL{K) . The structure of the Markov 
group is discussed in 1 14|, and from the viewpoint of invariant theory in |24, 25 10|. For the case of 
phylogenetics, the obvious generalisation is to the tensor product space , with group action as the 
appropriate subgroup of the direct product group GL{K)^^ . The final step is to descibe the branching 
process upon this tensor product space. 

In order to formalize this we introduce the splitting operator 6 : V ~* V ®V . Progress is made by 
simply expressing the most general action of 5 on the basis elements of V : 

(5-e„= r/%^e^, (2-10) 

a,/3,7, 

where Vj^'' are an arbitrary set of coefficients. Imposing conditional independence upon the distinct 
branches in order to constrain these coefficients, we need only consider initial probabilities of the form 

PW^^l^ 7 = 0,l,...,i^-l. (2-11) 

Consider a branching even at time t so that the initial single taxon state a small time before branching is 

P(A)W = E?'fA)Wea = JZ^aG" = (2-12) 

Directly subsequent to the branching event the 2 taxon state is therefore given by 

P(x){t) = 6 ■ p^x){t) = 5lTP/ep®ep:. (2-13) 

On the other hand, conditional independence leads to: 

V{X = aia2,t + 6t\Xi = X2 = X, t) 

P(Xi = ai,t + 5t\Xi = A, t) • P(X2 ^a2,t + 5t\X2 A, i). (2-14) 

Using the tensor formalism these transition probabilites can be expressed separately as 

9 

P(^2 = a2,t^5t\X2 = A,t) = ^M2"%-((5i)p^^)(i). (2-15) 

p' 

However, from J2-7l l we have 

¥{X = aia2,t + St\Xi = X2 = X, t) 

= Y M,'^\{St)M2"%,{6t)SlTP/ . 

Implementing ( I2-I4t . ( I2-12> and considering the limit 5t ^ Q with M"^ p{5t) S"p then leads to the 
requirement that 

Tf=Sis(, (2-16) 

^Although the above presentation involves only real numbers, we work over C to allow for the use of more convenient sym- 
metry adapted bases, or other ways of diagonalising rate matrices 1 23 . Of course, measurable quantities are referred back to the 
distinguished basis at the end of the analysis. 
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Figure 1 : The general Markov model for four taxa with tree (1((23)4)) (or (r((24)8)) in terms of binary 
edge labelling). The M 's are arbitrary transition probabihties (Markov matrices) on the designated edges. 

and the definition for the splitting operator in the preferred basis becomes simply 



Using the above notation we are now in a position to write down formally expressions for the probabil- 
ities on arbitrary trees. As an example, the expression which defines the general Markov model on the tree 
(1((23)4)) (or^ (l((24)8)))isgivenby(seefiguren 

^(r((24)8)) = {M:^®M^®M^®M^)l®5®l{l®M^®l)l®5{l®M^^5 -p (2-18) 

where p is the initial single taxon distribution and the M 's are arbitrary stochastic operators (Markov 
matrices) on the designated edges. 

3 Fock space and momentum labels for binary trees 

In the previous section we have presented a description of phylogenetic systems in terms of a multilinear 
tensor calculus based on copies of the basic state space V ~ . This comprises vectors with positive 
coefficients in the distinguished basis, corresponding to the theoretical probabilities for observation of 
a particular character a , a = . . . — 1; higher rank tensors p"ia2- represent joint probability 
densities. Moreover, we introduced a linear operator 5 : V —* V ®V , again defined by its matrix elements 
in the distinguished basis, representing phylogenetic branching viewed dynamically as an event occurring 
at a specific time. 

In this and the following sections we wish to argue for a more universal view which is appropriate for 
arbitrary trees. Given that branching might occur at various times, this means that the 'state space' might 
be anything from V (for the root edge of the tree), Xo V ®V (if there is only one branching), and so on, 
up to V ®V ®V . . . ®V , L times, if the final number of taxa (number of leaves of the tree) is L . The 
only logical way to encompass all these possibilities within one description in linear algebra is to adopt as 
the proper state space, an appropriate Fock space F associated with V , in this case for example 




(2-17) 



F^ ^(C ®V ®V®V ®---®V®V (g;)---®V 



(3-19) 



^In terms of the binary labelling introduced below, this tree is (1((24)8)) , with the remaining edge assignments determined 
additively. 



The advantage of this formal change of perspective is that it allows both the normal time evolution (as 
described above, the Markov rate operator acting on each copy of V), and the branching operator (as 
described above, 6 , or its natural extensions l(>5]l...(8)^(g)...l, simultaneously to be regarded as 
operators on F . 

In physical settings it is conventional to apply the above construction for the description of 'composite' 
systems where the state space V corresponds to a single subsystem, and the tensor products allow for 
copies of V corresponding to different numbers of subsystems. In relativistic systems this is of course the 
setting for elementary particle interactions, but the same idea is also appropriate in the nonrelativistic case. 
However, in quantum systems the Pauli principle mandates that the general spaces V (E) ■ ■ ■ <E)V are 
too big - the individual subsystems are indistinguishable in that the ordering of individual state vectors in 
the tensor product is immaterial (up to a possible sign factor for fermionic systems). This means that the 
relevant Fock spaces are technically speaking the linear spaces F'^ , F~ associated respectively with the 
symmetric (for bosons), or (for fermions) the antisymmetric or exterior tensor algebras: 

F+=C(BV®VVV®---®V\/VV---VV---, 

F- =C ® V ® V AV (B ■ ■ ■ ® V AV A - ■ ■ AV, 

= ®Lo (A"V^) (3-20) 

In adopting the machinery of Fock space to the phylogenetic context, the 'subsystems' become the in- 
dividual taxa extant at any particular stage of the branching process, and the (anti)symmetrisation principle 
would need to be interpreted as saying that all taxa are equivalent, or that the tensor probability density 
of rank n, is totally symmetric or totally antisymmetric. Thus for a given choice of observed characters 
represented by the symmetric probability density paiaa- .a^ ^ would be immaterial which taxon (from 1 
to n in this case), carried which character: 



(3-21) 



for any permutations a . In phylogenetic branching, this symmetrisation may well be appropriate for 
cases where it is suspected that a number of siblings are diverging from a common origin with equal rate 
matrices'*, but in general, we would certainly wish to be able to distinguish between taxa. 

These considerations imply that the higher rank tensor spaces V (E) V (E) ■ ■ ■ <E) V introduced above 
should be regarded technically as products of a number of labelled spaces, for example for the final L 
taxon system, Vi (S> V2 lE) ■ ■ ■ <E) Vl where each Vn is a distinct copy of V , Vn — V , n = 1, . . . , L . 
However, since the n taxon spaces required for the system at earlier times (arising from branching at 
intermediate nodes above the leaves of the tree) can comprise any subsets of the labels 1, .... L, we are 
led necessarily to a labelling system apppropriate to the power set 2^ , or simply to the well-known system 
of edge labelling for binary trees, by binary L -vectors, whereby leaf edges are labelled by powers or 
decimal equivalents 1,2^,2^, - ■ ■ 2^^^ , and the assignments for the remaining edges determined additively 
(for an example see figure|2li. 

To this end we therefore introduce the following (extended) Fock space (we discuss only the bosonic 
case in this paper): 

j^+=c©vevvv®---©vvvv---vv + ---, 

= ®:r=o (V"V); 

V:= ^ ©Vk. (3-22) 

The linear operators which can be used to construct the branching operator are defined in terms of the 
so-called creation and annihilation operators on . For G Vk , v^* G define the operators 
a^(wk) : V"V V"+iV, a{v^*) : V"V ^ V^-^V by (see for example f^) 

a'^ (wk) • Wki V Wk2 V • • • V t;k„ =i'k V Wki V Wka V . . . V i;k„ ; 

n 

a{v^*) • Wki V Wk^ V . . . V -Wk,. = S\„,v^*ivk^)vk^ V . . . Wk„ ■ • ■ V ?;k„ 

?7l — 1 



'^A similar situation may apply in the antisymmetric case, but we shall not consider it further here. 



where v denotes the omission of the corresponding vector (the dual action has been formally extended 
to be zero on differently labelled spaces, and the corresponding ^'^k^ factor displayed explicitly). The 
operators so defined should then be formally summed to give operators on the whole of ^+ (and by 
definition a^{v*) • C = 0), for which we retain the same symbol. In particular for the unit vectors eka 
and their duals e^" we define 

a^(eka) :=aL, «(e''") := a""- (3-23) 

The operators a{u*), a^v) fulfil the connmutation (ordering) relations a{u*)a^{v) — a^v)a{u*) = 
[a{u*),a^ (v)] = u*(t;)]l, where 1 is the unit operator on .F+ ; in particular for the mode operators aj^^, 

[a''",ay=6\S''(3l. (3-24) 

Moreover if we define the 'ground' state to be 1 G C , we have the algebraic means to write an arbitrary 
element of the corresponding distinguished basis in Fock space, 

ekiai V ek,a2 V • • • ek„a,. :=4iai • "'La^ ■ ■ ■ «La„ " 1- (3-25) 

In what follows it will be notationaUy more compact to introduce the so-called Dirac bra-ket notation for 
vectors in V and their duals. Thus formally we write 

1^10), 1*^(01; 
eka ^aljO) = |k,a), e*''« ^ (0|a''« = (k,a|; 

ekiai V ek^as V • • • ek„a„ —flkiai ' ^Las " ' " "Lc«„ 1^) = |klQ!l, ^2a2, ■ ■ ■ k„Q!„), 

where the latter list may include repetition. In this case the explicit notation 

|kiai,mi;k2a2,m2;---k,c.,,m,) =(al,,„, • (a^a J"' ' ' ' ("L,)"'- 10): (3-26) 

(corresponding to the so-called number basis) is occasionally mandatory. Finally we introduce the natural 
Cartesian inner product on these state vectors (with the Cka orthonormal in V ), extended to J^'^ in such 
a way that each creation and annihilation pair is mutually hermitean, and in general 

r 

(kiai, mi; k2a2, ^2; • • • k^a^, mr|li/?i, ni; I2/32, ^2; • • • n^) =5rs ^k,i,<5a,/3<, • 5m,n,rnq\ 

9=1 

(3-27) 

Although the general structure will be needed in the formalism below, sample states are in practice those 
belonging to a fixed number n of subsystems (for example n = L, the number of taxa), with (distinct) 
labelled momenta without multiplicity, of the general form 

K-l 

\P)= Yl P"^"=-""|kiai,k2a2,---k„a„). (3-28) 

Such state vectors can immediately be attributed to a theoretical probabiUty density for n taxa provided 
that the coefficients are positive and that their sum is unity. For technical reasons we introduce an auxiUary 
'reservoir' state (dual, or 'bra' vector) 

K-l 

(")(n|= ^ (kiai,k2a2,---k„a„| 
SO that this condition can be written 

K-l 

(")(rj|P) = l i > ^ paioc2-a^ ^ (3.29) 



In full generality, the auxiliary vector (allowing for multiplicities and summing over different momenta) 
becomes 



(fll :=(0|e^''^''^2 ^°=» """^ or (3-30) 
^{n\ ^(o|e^''^""2 ^°=" (3-31) 

where the latter form is convenient for notational purposes (with the understanding that Xka ^ 1 )• 

We shall be concerned with general functions / (such as the probabilities P , and below with opera- 
tors built from the creation and annihilation mode operators) which are obtained as formal sums of terms 
depending on the 'momentum' labels, say /h... . With the convention we have adopted (of scaling the k 's 
by TT ) to any such function we can associate functions over a dual space x, y • • • G by a formal Fourier 
transform. This is of course the discrete Fourier-Hadamard transformation (the phase factors are simply 
±1 ), and the functions / on 'configuration' (position) space are periodic with periods 2a for a e Z2 . In 
particular for the constant function in one variable Hk = 1 , 

1 = ^ 5(x)e-*('^-""), 



/k= ^ /(x)e-*(''-). (3-32) 



/ki= /(x,y)e-*(''''+'''', (3-33) 
and generally 

/(x + 2a,y + 2b,...) = /(x,y,...). 

As an example of the creation and annihilator formalism, let us give an operator on jr+ equivalent to 
the branching operator 5 : V ^ V (E)V introduced above (which has to be extended case by case to allow 
for branchings on particular factors of for a particular tree). Recall that the general form 

S{ec,) ^rJ'>efi(E)ej (3-34) 

was subsequently specialised to T^^^ = 5°' ^6°"^ on the basis of conditional independence. Next as- 
sume that the copies of V involved are distinguished by different labels k, 1, m so that there is no differ- 
ence between the above use of ® and the correct V as far as the symmetric algebra is concerned (below 
we shall see that the momentum labels are such that k = 1 + m). Consider then the operator A = 

Ea,/3,7ra^''a"at0a%, audits action on a state V3 \p) = p^\Q) +p^\l) . . .+p^-^\K = Y.^p^\£,) 

A|p) = r^'^Tat^at^a^^/atflO) (3-35) 

Q,/3,7 
Q,/3,7 

Thus, indeed, the requisite branching from the initial ancestral density \p) — X^qP"!*^) ^'^ '^he density 
for 2 taxa after branching, with characters shared equally ( |P) = X^qP"!'^''^) f^^" '^^e special choice 
J2-16I I of F), has been effected, and the operator A provides a generahsation of the splitting operator 



(where (5(x) = (5(x, 0)) . More generally. 
In two variables, we have in turn 



2 

(k-x+l-y) 



/kie 



S of j2-10L (|^^) suitable for representing embeddings of the latter on individual factors of the tensor 
product, as in ( 12-181 1. In Sj6]below, we return to this operator in the context of a dynamical change model 
for branching. As will be seen, it needs to be embellished by edge 'momentum' labels in order to generate 
appropriate phylogenetic trees, and also to be assigned a time dependence corresponding to the fact that 
branching events will occur at specific times in an evolutionary sense. These apparent complications need 
to be contrasted with the fact that if the splitting operator S is used in its original form, for a specific 
tree, its action on tensor products must be extended on a case-by-case basis, as in (I2-18> . In §s2]|5]below, 
we turn to a review of the path integral method for solving the time evolution of systems described in the 
operator language, and then apply the technique to a system of taxa which is 'free', that is, evolving without 
any phylogenetic association, after having developed the appropriate form for the rate operator of such a 
system. 



4 Review of path integral formalism 

In this section we review briefly the path integral formalism for the representation of the time development 
of stochastic processes whose 'microscopic' states represent probabilities of certain 'particle' numbers at 
each time. The aim of the next section will be to apply the technique to the multi-linear representation 
of taxonomic states developed in ^ and transcribed into the 'occupation number' representation in !j3] 
above. The task at hand is to transcribe the abstract occupation number representation (as developed for 
our purposes in the previous section) into a formalism of integral operators acting on generating func- 
tions representing the appropriate probability densities. This section closely follows the presentation of 
PeHtiUiJ. 

For a single system we therefore have microscopic states of the form (see ( l3-23» |n) — a^" |0) , with 
the creation and annihilation operators and a being hermitean conjugates of each other with 

a'\n} =|n + 1), a\n} = \n — 1), 

{n\m) =nlSrnn- (4-36) 
Next we make the transcription from states 

oo oo 

to a space of functions 

oo 

10) ^ 0(Z) = 

ri=0 

where the variable 'z' is a formal variable if (as in the usual statistical context) (f){z) is meant as a formal 
generating function. However for the present development it is convenient to allow z to be complex and to 
regard the (j){z) as analytic functions belonging to a Hilbert space. In terms of defining path integrals z can 
be taken to be real, or analytically continued subject to certain prescribed asymptotic behaviour (possibly 
together with constraints forcing its passage through specified points of the complex plane). 
Using the elementary identity 

n\S„,n= J dzz"{-^rS{z) (4-37) 
(which can be established by integration by parts) the scalar product ( l4-36> becomes 

= J dzcb{z)i^{~j-)6{z), or 
(M) = / '^(p{zmOe-^< . (4-38) 

Associated with the matrix elements Amn = {m\A\n) of any operator in the number basis is the integral 
kernel A{z, Q 



m,n— 



^(z,C)= V ^A„„^ (4-39) 
^ — ' to! n' 



such that 



1^) =A|0) = yMHAMH 

' ' ' ' ^ ml TiJ 



m! 

m.n 



can be expressed via "^{z) = ^ V'm^™ , with 



= / ^^(z,C)'^«')e-''^^' (4-40) 



easily estabhshed using the identity J4-37> above. Similarly the integral kernels of the product AB of two 
operators A, B is: 

AB{z, ^A^, v)B{iri', Oe-^""' (4-41) 

The integral kernel A{z, Q has a natural combinatorial connection to the normal kernel C,) where A 
is expressed in terms of creation and annihilation operators: 

oo 

A= a^"M„„a", define 

m.n— 

Then there is the simple relationship 

A{zX) =e<A{zX). 

Consider the effect of stochastic time evolution on the system. In the linear case the state probabilities 
are assumed to change according to the master equation 

= y {rni->n(t>m - rn^m4'n) (4-42) 

where Tm^n are the transition rates. It is convenient to define Rnm = fm^n and i?„„ = — X)m ^"^"i ~ 
— Rmn so that the time evolution becomes 

^4>n =Y RnrnC/jm (4-43) 
m 

with the understanding that the rate matrix satisfies ^nm = , for all m , or introducing the reservoir 
state (ilj from above, and regarding R{t) as an operator on state space which can be time dependent^ 

^m))=R(t)m)), with {n\R{t)=Q. (4-44) 
at 

With the above notation we can now develop a path integral representation for the evolution kernel of 
the system. Approximate the form of evolution operator for a small change as M^ij^gf i^ ~ e^^*^"^* , and 
for the evolution operator as a whole as a product of infinitesimal changes 

M(Tfl) ~M(j^T_g^^ ■ M)^x~St.T~2St) ■ ■ ■ M(^2St,St) ' -^^((5t,0)- 

Approximating each of the exponentials by a linear expression, and using the above relation between 
normal and integral kernels leads to 



M^t+5t,t){^, C) ^e<{l + StR{t){z, 0). 



^ This entails dipn/dt = (n|-Rl(/<)/n! , consistent with the resolution of the identity (see <4-38l above). 



Using this and iterating (14-4 l> to give a multiple integral representation of this product, assuming T — N5t , 
we have 

M(^T,Q){Z,Q- / M(^T,T-St)[Z,r]i) ^ M(T-St,T-2St)(jll,m) 

• M(2St,5t)VlN-2^'nN-l) ^ M^stfi){VN^lX) 

^ / n ■ ( E - m) + StRmvWi^m)]] ■ e^"" (4-45) 

which leads formally in the limit iV ^ oo to the path integral representation (c.f. 1181 equations (2.23), 
(2.24)) 

Mt{z, C) = y <Md\ii\ exp ^ dt (^-irlit) V (t) + iRtiW , v)) + zviT)^ ■ (4-46) 

Here the 27r factors have been incorporated into the path integral measure, and the integrations over paths 
r]{t) , ri'{t) from to T are made with the boundary conditions on each endpoint given by 

ry(0) = C, ^^/(T) = z. (4-47) 

The additional boundary term exp {zri{T)) also arises from the continuum ( TV ^ oo ) limit of the iterated 
product representation. 

It is important to point out that the path integral representation f 18' T, "61 also allows closed form 
expressions to be written down for the means (and in principle higher moments) of any desired observable 
quantities. This has not only formal significance but also, depending on the operator, opens an avenue for 
explicit analytical calculations. 



5 Evolution kernel for free phylogenetic system 

With our review of path integrals for stochastic systems in hand, we now return to the discussion of phy- 
logenetic systems in the notation of S|3l We concentrate here on the 'free' system, that is, phylogenetic 
evolution without phylogenetic branching. As we now argue, the normal kernel of the rate operator can be 
taken to be quadratic, so that the entire path integral assumes Gaussian form and admits a formal steepest 
descent evaluation. In the next section we also introduce interactions along the lines of ( 13-351 1 and indi- 
cate in simple examples which indeed reproduce the expected evolution (at least if the rate matrix is time 
independent) that this leads to the correct probabilities*. 

Transition rates have been discussed in lOlin the tensor formalism, and in ij^in introducing the path 
integral representation. However in the context of |J3] the appropriate time evolution must be the assignment 
of a rate operator to each possible edge 'momentum' label, k G irlJ^ ■ I" contrast to ( l4-42t then, in which it 
is assumed that the rates rm^n and the |m), |n) refer to li/j^ermg occupation numbers, we thus construct 
initially a number-conserving rate operator, at least inasmuch as the 'particle number' operator does not 
distinguish between the character types a = 0, 1, . . . , A' — 1 which take on the status of 'internal' degrees 
of freedom. Indeed the number operator for edge k is 

iVk = ^ aj^^ a"^" , such that 

a 

[iVu,a[„a'"3] =0 Va,/3 (5-48) 

This means of course that the rate operator must be bilinear in both creation and annihilation operators of 
type k, leading to the second-quantised expression 



*The formalism also applies to the inhomogeneous case (time-dependent rates), provided that the 'propagator' is known (see 
below). 



As mentioned in ^ above, we will be concerned with single occupation numbers for each momentum 
mode (for generalisations see the concluding remarks in ^below). Thus for a general tensor state ( l3-28> 

(c.f. m, 

I P) = R\P) 

A'-l 

= J2 ^'''^"■■^"^Iki7i,k272,---k„7n>- 

7i,72,---7ti=0 



Using the fundamental relation 



keirZ^ a,l3 

= E E«L^k",[a''^ag 

a 

where (l3-24> has been used, together with ( l3-26> . we find finally as required 

\ P) = E P |ki7i,k272, • • -knTn), where 

71 ,72,---7>i=0 

^7i72-7-> ^ ^ |^^j^Ji^p772 -7. + p^j2_^p7l7-7. + . . . R^^'r,^ ^pll'. 

(5-51) 

whence P'yi'^^-'^"(T) =^ A/t'^^^^A/t^^^ • ■ •Mt^"5„P''i''-*'-, where (5-52) 
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i5i 



Si' 



It remains to transcribe these results into the path integral notation ( l4-46> and verify that the same time 
evolution is predicted in the 'free' (Gaussian) case at least for time-independent rates. Clearly, for each 
degree of freedom k, a there is a pair of classical paths "n' {t)-^^,ri{t)^^^ or collectively simply rj' {t),ri{t) . 
From the fact that the rate operator is expressed by ( l5-49> in normal form we take 

In these circumstances the time evolution kernel is particularly simple. Explicitly 

k,Q,/3 k,, 



Mt{z'X) = / [d7?][dry']exp [ dt{-zJ2v"'''it) i^^ (t) + * E Wka^k^??^'"') + E ) ' 



(5-53) 

subject to the appropriate boundary conditions. The integration over all paths 77' (t) , imposes a functional- 6 
constraint on ri{t) , namely 

. ka _ — . , „ 

k,/3 

whence t]^" (T) = ^ MTk/3^(0)''^ (5-54) 

k,/3 

(if the transition rates are time-independent), where Airk is given by ( 15-5 U above. Thus the 77 (i) path 
integral contribution (up to a normalisation constant^) comes from the boundary term which gives using 

'For this case the discrete version can be worked out explicitly as a (multidimensional) standard Gaussian integral, and the limit 
Af — > 00 considered. In the present heuristic discussion we simply assume that the steepest descent method yields the correct result. 



Mt{z', C) =Ce^''° ^'-"^^^M^"^ (5-55) 



for some integration constant C. 

For a phylogenetic tensor state (as discussed above, with unit occupation number in momentum modes 
ki, k2, . . . , k„) we set 

K-l 

Pt{z[, z'2, . . . , z'^) = ^ paia2 ""zl^^^^zl^^^^ . . . Z^^^^. (5-56) 

Denoting the state collectively as Pt{z') we then have from (l4-40l t 

Pt{z') - y'n[^C'kJMC'^"]-^T(^',C)e~*^''-^'-'^''>o(iC')- (5-57) 

k.a 

Substituting ( 15-551 1 it is evident that the C, integrations impose a functional- 5 constraint^ ^(*C' ~ z' ■ Mx) , 
or 

Priz') ^Poiz' ■ Mt), or 

Pt{z[,z',,...,z'J^ n"^""-""^'-^Tk,k,a,^'-^Tk.k...---^'-^Tk.k„a„. Or 

Pt{z[,z',,...,z'J= ^T^""^""4,a,4.o.---4„a„, whcrc finally 

: Y Mt'"s,Mt'''s, ■ ■ ■ Mt''" s,Po'''"'" (5-58) 



as derived explicitly above. 

It is instructive to re-write the classical 'free' evolution kernel in Fourier transform space. Recalling 
the duality between 'momentum' labels k, 1 e 7rZ2^ and 'position' coordinates x, y G , define 

X y 
Rit)kfj = Y trp^^'^'''^ so that (|523} becomes 

Z 

Mt{z',0 = / [dr;][dr/]exp ( T ^ /^^(x, i) ^^(x, t) + 7y;,(x, i)i?(x - y, i)%,7"(x, t))+ 

+ ^z^(x)r,"(x,T) |. (5-59) 



6 Interaction terms and simple examples 

In this section we turn to the complete phylogenetic system incorporating 'interaction' terms. In the pre- 
vious section we constructed the 'free' part of the evolution kernel A4t = J [diT\[dri'] exp 5o[77, V] for 
the phylogenetic 'fields' ?7^(x, t) , 'ri°'{x, t) . Incorporating interactions, the kernel will acquire additional 
trilinear terms 5*1 in the exponent representing phylogenetic branching events, in such a way that the 
manifest translation symmetry in 'position' space is preserved. 

In SjSlabove, it was pointed out that the 'branching operator' 5 which was formally introduced in ( I2-17> 
of i|2]can be represented by a trilinear 2 <— 1 type operator in Fock space (compare (l3-35t above). In the 
case where there are up to L extant taxonomic units labelled by binary i -vectors (edge 'momenta') to 
allow for the development of a particular ancestral binary tree, this vertex interaction must be given definite 

^Setting the arbitrary integration constant to 1. 



momentum labels. The labelling is of course always such that an edge k G bifurcates to edges 1, m 
with k = 1+m . Moreover, if the branching is pictured as a dynamical process, the interaction must be time 
dependent. The simplest possibility is that the system for L taxa will evolve as a result of a fixed number 
M of branchings at times t/ , / = 1, . . . , M between times = (from which time some assumed 
ancestor(s) evolved) and Im+i = T (the final time of measurement). A means of forcing these events is 
via S{t — tj) interactions at times ti < ■ ■ ■ < tj < ■ ■ ■ < Im (with = to < ti and < tM+i =T). 

With the above motivations we propose the following 'interaction' term Si for the full evolution kernel 
Mt = J [d'r]][di]']exp S[r],ri'] of the phylogenetic system, where S = 5o + with 5*0 given by 
15-53l5-59> and 



As expected, the binary edge labelling is reflected in the manifest translation symmetry of this expression. 
With the complete model 5*0 + 5*1 , the path integral formalism can now be used to construct (in a pertur- 
bation expansion, see below) the evolution kernel for the full system, and hence transition probabilities for 
evolution, from any initial state to any final state. In the case of phylogenetic inference, one is of course 
interested in evolution from an initial root edge (at time t = 0) to (at time t = T) an observed joint 
probability density for character types of L taxonomic units. 

The model ( l5-53> . ( l5-59t . ( l6-60t is generic in the sense that an arbitrary (but fixed) number of branch- 
ing events AI , and any compatible branching processes for binary edges, is encoded. If probabilities For 
connected binary trees, with a single root and L leaves one should of course admit only M = L — 1 
S -function forcing terms, and adopt standard momentum labelling, for example for L 1 the root edge 
may be chosen as the binary L-vector (1, 1, 1) , and the edges the binary L-vectors (0,0,...,!), 
(0,...,1,0), (0, . . . , 1, 0, 0) , ••• (denoted below by decimal equivalents 1, 2, 4, • • • )■ For formal 
analysis with a specific tree, it may in fact be combinatorially more powerful to consider all such admissi- 
ble L ^ \ momentum routing schemes. 

For completeness, we derive in the appendix, §A, a formal perturbative expansion llSI . and give ex- 
plicit Feynman rules for the present model (see table 0. The evolution kernel for 5*0 + 5*1 is re- written 
by expanding exp(S'i) in a power series, so that the essential ingredients are specific path integrals of 
monomials in the phylogenetic path variables, weighted by the 'free' part. In turn, these moments can be 
reduced to functional derivatives of an extended 'free' kernel, with respect to ancillary 'external' path vari- 
ables coupled by additional linear terms to the path variables which are being integrated over. The extended 
kernel is again quadratic and can be evaluated as a Gaussian in terms of the formal inverse bilinear form 
or propagator with appropriate boundary conditions (see appendix §A). Moreover, the 6 -function forcing 
terms require the derivatives with respect to the external path variables to be evaluated at the interaction 
times tj . The probabilities (pattern frequencies corresponding to all binary L leaf trees with evolution 
on edges determined by the specific edge rates R\i{t) ) so constructed are identical to the usual likelihood 
calculation via extended leaf colourations for example. In the earlier second-quantized version, (see I.2J). 
the model was constructed using the canonical (creation and annihilation operator) formaUsm, and the in- 
teraction term treated in time dependent perturbation theory. We emphasize that, although well-known, the 
result in our formalism follows automatically from the time evolution kernel for the model (effectively, an 
appropriate Markov rate operator lifted to the whole Fock space), so that in this sense we have produced a 
truly dynamical model for phylogenetic branching processes. 

We illustrate our results by reiterating some concrete examples from 1 2 1 together with some further 
remarks. Consider the case L = 3 , M = 2. Nonzero rate constants are chosen for the root and leaf 
momenta 7 = (111), 1 = (001), 2 = (010) and A — (100) respectively, together with a i/ngZe additional 
momentum 6 = (110) associated with the tree T = (1(24)) of figure^ Clearly the contribution to the 
3^1 scattering probability (or likelihood) associated with this tree is, as required, the term arising (in 
the operator formalism |2|) from inserting intermediate states in the above with the correct intermediate 
edge momenta, or (in the perturbation expansion of the path integral method) from the correct linking of 
propagators and vertices at this order (see Feynman rules in appendix, §A, and table [!). Either approach 




(6-60) 



gives finally 

p^ajajaj a|4 \Pt{T)) = (ajl aj2 a^l \Mr{T, 0)| p^(0)) 

= E ^2"^^/33^4"^^/3/2^=^''Sg • . M^^^^^p-^. (6-61) 

Here | (0)) = X)p"^(0)|q^7'^) is the state representing the initial root edge probability density, and the 
Afk are the Markov transition matrices for the appropriate edges, namely A/k = e^i^^^ with Ak the time 
interval for evolution on edge k , Ak = </' — </ where the branching times at the source and target of 
edge k are t/ and tji . 

As indicated by the T subscript in ( I6-61> . the total expression for P{T) includes terms additional to 
the contribution from the selected tree. In fact without additional subtraction terms (see f5','6','T8','2'|,) the 
model as formulated is not probability conserving. However, in phylogenetic inference (for example max- 
imum likelihood analyses) it is appropriate to generate contributions from all candidate trees for unknown 
rates. In the present case the additional terms arise of course from other admissible trees. In fact, even 
if only the rate constants for edges specific to a selected tree are nonzero, there are still contributions (in 
the operator approach) from intermediate states with non-propagating momenta, and these also arise in the 
combinatorics of the path integral perturbation expansion(see below). Thus in addition to ( 16-6 1> there are 
the trees with effective trivalent nodes, 

shown in figure|3] The fWvalency comes by deleting edges for rates with i?k = and re-joining the target 
and source nodes such that there is an effective 3 point vertex corresponding to a branching operator or 
interaction vertex structure coefficient with components (compare ( l3-34t ') r^''''''' = 5a^5a'^5a^ . (Such an 
effective interaction term might also be viewed as the result of directly integrating out the r]^, r]\^ variables 
corresponding to i?k = 0). For the tree in question, the non-propagating momenta are 3 = (Oil) and 
5= (101) corresponding to the trees 7j and 7g respectively. The terms differ from one another because 
the edge evolution times T — ti and T — t2 are distributed differently over the Markov matrices Afj , Mj 
and A/j as indicated by the ' in ( l6-62t . ( l6-63t and the differing edge lengths in figure |3l Of course, it is 
always possible to regard these terms as vestigial contributions from standard binary trees with very short 
edges. In fact, since the usual counting relation between edges and leaves for binary trees obviously does 
not hold for the trivalent trees, the formal introduction of a scaling parameter would serve to distinguish 
these and similar noncanonical tree diagrams. 

7 Conclusions and discussion 

In this paper and the previous work 1 2 1 we have proposed a transcription of phylogenetic branching pro- 
cesses into the language of a stochastic dynamical system evolving according to an appropriate Markov 
rate operator on a suitably extended 'state' space. The analogy with statistical and particle physics is that 
the 'particles' in the phylogenetic context are the individual taxonomic units, and it is these which evolve 
in type and number (as in Markov models of reaction diffusion or birth-death processes, or in relativistic 
particle scattering) in the course of evolution. In 1 2 1 a conventional operator approach was taken, whereas 
in the present work the path integral formulation introduces to phylogenetics the familiar physical notions 
of 'paths' and 'fields' (over a discrete lattice). Our treatment including 'interactions' representing branch- 
ing events, including explicit Feynman rules, (tabled in appendix, §A) establishes the equivalence of the 
path integral formulation to the operator version 1 2 1 via standard perturbation theory as the appropriate tool 
for completing the transcription. 

The path integral language allows a range of techniques known in the context of the analysis of physical 
systems I18l l5ll6 1 to be deployed for phylogenetics. One immediate point is the relationship between the 
formulation of transition probabilities in 'momentum' space versus the dual 'position' space - standard 
in condensed matter systems, and also known in phylogenetics in the literature on transform techniques 
for phylogenetic inference involving the discrete Fourier Hadamard transformation, in principle to derive 
an edge rate spectrum for a phylogenetic tree directly from an observed data set of pattern probabilities 
I26II19|[T21 . In the present framework, momentum conservation is a reflection of translation invariance on 
the underlying discrete lattice. 




Figure 2: Binary labelling scheme for a tree on 3 leaves T = (1(24)) with branching events at intermediate 
times ti, t^. Nonzero rate constants for the model are chosen for the root and leaf momenta 7 = (111) , 
1 = (001), 2 = (010) and 4 = (100) respectively, together with a single additional momentum 6 = 
(110). 




Figure 3: Additional effective non-binary trees and 7g contributing to the probability in the phylo- 
genetic branching model for the three leaf case. Non-propagating momenta 3 = (Oil) and 5 = (101) 
produced by the branching interaction term at ti cause effective trivalent vertices with different evolution 
times T — ti, T — t2 on long and short leaf edges. 



General considerations for the path integral formulation include the further application of symmetry 
principles in various ways. For example, continuous Lie symmetry group actions on the path variables 
(phylogenetic fields), for example i] ^ fj - U , tj ^ U -t] can be analysed for their effect on the dependence 
of the time evolution kernel on the various rate and time parameters of the model. This has been pursued in 
1123 1 (in the explicit tensor description) for the well known Kimura 3ST model for 4 characters [15 ] where 
it was noted that the rate and branching operators intertwine the action of a U{1) x U{1) x U{1) (or 
X C^' xC^) group so that the resultant group reduction from representations of SU (4) (or SL{4) ) 
is intimately related to the properties of this model (it is well known that the Kimura 3ST model and the 
related 2P model do belong to the class of discrete colour group models |12 211 ). More generally, the Lie 
symmetry approach allows rate models to be considered in principle in terms of a hierarchy of symmetry- 
breaking terms. For example, in molecular phylogenetics at the protein mutation level, suitable symmetry 
groups would be those advocated recently in relation to the possible group-theoretical origins of the genetic 
code itself (see for example ITtI ITI). 

A deeper aspect of the branching model in the path integral formulation is the role of time reparametri- 
sations, t T{t) , in connection with notions of the 'molecular clock'. Given that 

, dt ^, . , S(t ~ tt) dt , 

then clearly the evolution kernel has the following covariance property, 

MT{ti,Rk{t))=MT{Ti.R'i,{T)), where ^^{t) = R^{t) ■ ^ and ti = T{tj) (7-64) 

CLT 

(it is assumed that dt/dr > 0, in particular r(t) is not orientation reversing). This is precisely the 
reason that, in standard probability approaches (see for example 1 12 1), 'dynamical' considerations involv- 
ing explicit time dependence can be absent - standard calculations require only the combinatorics of the 
tree (which is encoded in the present models via the branching times ti and the choice of momenta for 
which rates are nonzero). However, as has been mentioned already, there is good reason to formulate the 
branching process temporally as presented here. In order for generalisations of the S -function forcing 
interaction terms to preserve the time reparametrisation covariance (l7-64> . the introduction of an auxiliary 
phylogenetic 'gauge' field would be mandatory (as in some proper time formulations of relativistic field 
equations). 

As an illustration of this dynamical perspective, suppose now that for some edge momentum k* the 
edge rate can be written as proportional to some standard rate matrix, 

R^,(t) ^X*{t) ■ R*. (7-65) 

Then it is possible to define a phylogenetic 'proper time' r* (implicitly) as a function t , by solving the 
first order equation 

dt 1 



dr* X*{t) 

together with some suitable initial condition, for example t* j — T*{tj) = tj where tj is the branching 
time at the source node of edge k* . Then, with respect to this proper time, the edge rate i?k* {t* ) is by 
definition constant, and equal to R* . By extension, if there exists a distinguished tree path V* from the 
root to some leaf node, along which all edge rate matrices possess the above multiplier property ( l7-65> . 
a global phylogenetic proper time r* exists for that tree path, with the rate matrices piecewise constant 
(constant along each edge). Finally, such a tree path phylogenetic proper time may always be adjusted to 
coincide with geological or archaeological time determinations at certain points by piecewise linear affine 
transformation(s) of the form r* ar* + b (which may be edge dependent along the distinguished path) 
without compromising the above arguments. An extreme example of this situation is of course the case 
of a stationary Markov process, wherein each rate matrix is (proportional to) a given fixed matrix R , and 
the (weighted) sum of elapsed times along each tree path from the root to a leaf node, is constant - in this 
case a molecular clock exists in the strongest sense. As usual however, it is still impossible to disentan- 
gle evolution occurring on some edges with standard strength for time At , from evolution occurring over 
time XAt with scaled rates {X~^)R. In general, conclusions drawn from studies of 'time dependent' rate 
matrices should always be treated with caution because of reparametrisation covariance. Related consid- 
erations for general Kolmogorov equations, related to non-stationary finite Markov processes, but without 



explicit recognition of the role of the group of time reparametrisations (diffeomorphisms), have been given 
in I20II : for a discussion of general time-dependent Markov processes see f281. The 'intrinsic time' of 1201 
is nothing but the above phylogenetic 'proper time' r . This, in turn - interpreted as a gauge fixing choice 
- is essentially the Teichmiiller parameter for the configuration space of an implicit 'einbein' path variable 
which carries gauge transformations associated with the group of time reparametrisations on the interval 
[0, T] (see for example LI 1 1). 

Within the present reformulation it is also possible to examine generaUsations which may not be appar- 
ent in other contexts. An example would be analytical or at least systematic possibilities for the examination 
of the behaviour random trees in the limit of very large numbers of leaves, or of random branching events, 
for the purpose of comparative evolution studies. A further extension would be to include population 
processes such as mutation-selection effects into the models. 

A final potentially important analytical tool is the fact that (as mentioned above) the closed form ex- 
pression for the scattering probabilities represented by the evolution kernel generates contributions from 
all candidate trees for a given number of leaves. It is clear from our presentation that the characteristics of 
a specific tree can be encoded via the choice of nonzero rate constants for particular edge momenta, and 
that there may be several equivalent such assignments amongst the 2^ admissible binary L -vectors. The 
exploitation of the interrelations of these assignments might give insights into the derivation of 'invariants' 
(in this case for the combinatorics of trees, rather than for differential topology, as in the case of topologi- 
cal quantum field theory) which could provide useful constraints in phylogenetic inference and maximum 
likelihood 'optimal' tree searches. Indeed, in maximum likelihood approaches themselves, it may be useful 
to have a formal representation of all contributing terms, without the need for explicit tree enumerations. 
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A Feynman rules for phylogenetic branching models 



In this appendix we develop systematic expansion methods in the form of Feynman rules, for the calcula- 
tion of the time evolution of state probabilities in the model given by ( l5-53> and ( l5-59> . This establishes that 
the model is formally equivalent to the standard prescription for calculating likelihood functions for phy- 
logenetic trees, and provides the justification for the more qualitative discussion of the free and interacting 
cases given in §[|5land|6labove. 

Firstly note that the path integral representation of the (free) time evolution kernel A4^{z, () , ( l4-46> . 
( l5-53> and ( l5-59> can be written in various equivalent symmetrised forms emphasising the role of the 
boundary conditions, namely (using the generic form ( l4-46> to suppress affixes)'^, 

dt (~iii'{t) V (t) + RSil', 11)) + zr^{T) = dt [+1 i'it)rjit) + i^^(^r;^ r;)) + nj'iO)C 

dt (^-i\{7i'{t) V (t)- h'{t)r^{t)) + Rt{ir^', ij)^ + izr,(T) + Uij'{0)C. 

(A-1) 



Now consider the complete time evolution kernel extended by some ancillary path variables i£,'{t), £,{t) , 



Ml 



d[ri']d[r]] exp {irj' ■ K ■ rj + i^' ■ rj + irj' ■ ^) exp(+z77(T)) exp Si[ir]' , rj], 



(A-2) 



such that Mt ^ — ^ ^ Mt ■ The notation ' • ' in the exponential represents a definite integral of the occurring 
path variables with respect to time from t ~ to t — T , respecting of course the boundary conditions 
derived earlier, (l4-47> "'. The notation ' i-q' ■ K -rj' refers to the quadratic part of the integrand, in this case 
in the first of the forms (IA-1> . Finally an additional (for the moment generic) 'interaction' term is included, 
with 5*1 being the integral of the normal kernel. 

The aim is to consider the convolution of Mt with the initial state probability generating function, in 
such a way that the the expansion of the exponential of the interaction in a power series, together with the 
final state matrix element, and the folding with respect to the initial state probability tensor, are all reduced 
to formal derivatives with respect to the ancillary variables, acting on the expression for the 'free' kernel. 



Mx{z, C) := / d[ri']d[r]] exp {irj' ■ K ■ rj + ■ rj + irj' ■ exp{+zri{T)). 



(A-3) 



To this end consider the complete generating function for the final probability state vector (compare 
4Qj), 



Pt{z) = I dCdCMT{zX)e-'^'^P^{0 



(A-4) 



The additional 5*1 [r], irj'] interaction term in the exponential can be regarded, after a power series expan- 
sion, as a series of moments evaluated on the free kernel, so that 



Pt{z)= e ' di£,"d£, . I dCdC MUzX)e-"^''^ Po{C') 



(A-5) 



Also if we are interested in a final state consisting of L taxonomic units, the relevant probability component 
is by definition the generating function derivative with respect to the appropriate z variables; for example 



paiki...Qi,k£ 



d 



d 



-Pt(z) 



(A-6) 



2=0 



'Using either of the second two forms in the discussion following 15-531 leads to equivalent solutions, for example 
iv'^.c.iT)=i7l'^,0mM^T'^f^, and then iv' m = iv'^^emM^T? 

as before. 

'"Bearing in mind that the additional boundary contributions are for specific times, and are thus products, not integrals. 



From the dependence of the kernel on z (the first form in JA-1» . however, the derivatives merely bring 
down factors of r/(T) with appropriate labels, which in turn are equivalent to the corresponding differen- 
tiations with respect to i^' : 



d d 



(A-7) 



Finally, from the second form of the kernel in JA-1> . the path integral over will enforce a 5 -function 
constraint identifying iC,' with irj'{Q) , or partial differentiation with respect to the appropriate components 
of 



d d 



d 



d 



d^i"dO d^C.^ST) "\dm 



d 



■M°t{zX) 



(A-8) 



Turning to the evaluation of M?p{z,C,) itself, note that the quadratic part of the integrand in ( IA-2> can 
be written 



dtdt'iii{t') ■ K{t, t')ri{t) =11 dtdt'ir]'{t){-dtS{t - t') + R5{t - t'))r]{t'). 



(A-9) 

J Jo 

The formal completion of the square 

ir]' ■ K -rj + i^' -7] + i7]' ■ e = i{ri' + S,' R-^) ■ K ■ (rj + R-^O - • ■ C (A-10) 

suggests integrating out the resulting Gaussian after the change of variables iij' i{rf + ^'K^^) , 77 
{rj + K~^£^) , (which has unit Jacobian), leaving the expression 



(A-11) 



up to normalisation factors ( including det K ^ ) and boundary contributions. However, the explicit de- 
pendence on z and C, (which is to be integrated over in (IA-4> . ( IA-5» has been circumvented by the device 



of formally introducing appropriate differentiations with respect to the ^ variables, so that (lA- 11 > nor- 
malised with reference to the noninteracting case, is sufficient provided that K^^ is calculable. For the 
case of R constant this is easily checked to be 



(A-12) 



subject to K~^{t, t') = if t < t' . 

Consider then the noninteracting case (IA-8> . ( IA-12> with 5*1 = . Clearly the necessity to set the 
ancillary variables equal to zero after differentiation means that the only viable initial probability state 
vector is one also with L extant taxa, and with identical momentum labels. Explicitly we have 



d 



d 



Pr 



,/3iki ■■■/3i,ki 



d 



d 







exp // dtdt'e{t-t') J2 *^;m(OM(Vt')m/""(i') 
^ 7,(5,m 



(A- 13) 



Differentiations with respect to i^' , ^ with the corresponding momentum labels must be paired, leading 
finally to 

Pt - ^MtIci 0i^WTk2 ^jj-'-J^Tki, p^^o (A- 14) 



as was derived informally in (l5-52> . J5-58> . 



Turning to the interacting case, we are interested in the final state probability for L taxonomic units, 
assigned momenta ki , k2 , • • • say, arising from an initial state with one taxon (the root) with mo- 
mentum ko , thus the probability component for a L ^ 1 scattering process in the model. Once again, the 
necessity to set the ancillary variables equal to zero after differentiation selects nonvanishing contributions 
corresponding to precisely degree i — 1 in the power series expansion of the exponential of the interaction 
term Si[^, ^] (see (ED and dfr^ l: 



jQiki-Qiki 



1 



(L-1)! 
d 



/ k.l,m ^ 



(L-l) 



Pr 



/3oko 



d 



exp - 



If dtdt'e{t~t') j2 ^^;mWAf(V.om/""(i') 



7,(5,m 



(A-15) 



It is convenient at this stage also to choose canonical momenta (binary L -vectors, with a scaling of tt 
understood) ko — (1, 1, . . . , 1) for the root, and ki = (0, 0, . . . , 1) , k2 = (0, . . . , 1, 0) , ■ • • k2 = 
(0, . . . , 0, 1) for the edges (or decimal equivalents 1 , 2 , 4 , • • • )■ For formal analysis with a specific 
tree, it may in fact be more powerful to consider all such admissible L ^ I momentum routing schemes, 
however for combinatorial purposes any fixed assigment is sufficient. 

For L = 2 there is only one interaction, whose time is forced to be t = ti . Performing the differenti- 
ation of the exponential of the free kernel with respect to ^'^q'^q (0) gives 



pQikia2k2 



\ J2 ^(k-l-m) 



d 



9 pa/3 d 



k,l,] 



d 



d 



+ / C?^«eAko(0A^tk,//3o 



P^°''° ■ exp - 



7,(5,m 



4=0=?' 



and carrying out the two remaining ^ differentiations leads to 



Pr 



aikia2k2 



-\ J2 <5(k-l-m)r"^ 



k,l,m 



5^C,k,m d^^U.^T) dz^i^ih) 



ti 



exp - 



+ / rfi*Cko(0Aftko''/3o 



Jti 

[[ dtdt'e{t-t') V 



^Aiko 



For a nonzero result the remaining ^'(ii) and two ^'(T) differentiations can only be applied to the terms 
standing in front of the exponential. Moreover, the implicit 6 terms require the ^'(^i) differentiation to 
be applied only to the kg integral, thus fixing k = ko . Finally since ko = ki + k2 = 1 + m there 
are two equivalent ways to apply the remaining differentiations (cancelling the symmetry factor ^ in the 
interaction term) giving finally 



pQikia2k2 



.00^0 



(A-16) 



In the general case systematic diagrammatical rules (Feynman rules) can easily be ascribed and tabulated 
for the evaluation of ( IA-15t . On the basis of the above L — 2 (first order) case and similar considerations 
for L = 3 (second order), all possible probability component contributions for L taxa are constructed 
from the matrix element for L ^ 1 scattering as follows: 



Element 



Term 



internal edge 



ti 



ti' 



root 



--O/Jko 



leaf 



tj - n /3k. 
T -0«k, 



vertex 




r"-0,0„5(k-l-m) 



Table 1: Feynman rules for evaluating probabilities for L ^ 1 scattering in phylogenetic branching 
model. Trees are a combination of labelled graphical elements. Each tree contributes a term to the total 
likelihood or pattern frequency. (AfAk)"^ is t the Markov transition matrix for edge k and time interval 
(edge length) A, and F"^^/ is the vertex structure coefficient ( = (5"^(5"^' ). See concluding remarks for 
comments on the role of the group of time reparametrisations. 



Feynman rules for phylogenetic trees 

1 Diagrams consist of 2L—1 directed edges, L — 1 vertices with internal nodes, one external root and 
L leaf nodes; 

2 To each element is assigned character and momentum labels as in table [fl specifically, 

3 Root and leaf edge momenta are assigned canonical binary L -vectors (see text); momentum conser- 
vation between ingoing and outgoing edge momenta is imposed. 

4 Vertices (internal nodes) are assigned interaction times ordered ti < t2 < ■ ■ • ^l-i ; 

5 The root node is assigned time t = Q = , and the leaves are assigned time t = T = . 
To these labelled diagrammatical elements, the following algebraic terms are associated: 

6 For each directed edge, a Markov transition matrix for time interval A ~ [tp —tj), Q < I < L — 1 
between the target and source nodes, and for its assigned edge momentum, and matrix element 
determined by the source and target character labels assigned (see table^; 

7 To each vertex, an appropriate component of the F structure coefficient (see tabled; 

8 Consistent combinations of these elements, summed over all internal momenta and character indices, 
with valid momentum conservation, correspond to contributions from all possible labelled L leaf 
binary trees. 

Using these rules, likelihoods can thus be written down autonomously and diagrammatically, without 
reference to the path integral context; however as stressed in the main text, the utility of the path inte- 
gral formulation is precisely to provide a self-contained prescription for them without the explicit need to 
enumerate trees. 
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